05. Execution Plan

Execution Plan

ND004 C01 L02 05 Execution Plan

Join Review

On a joined select query that joins a vehicles table and a drivers table, the execution plan traverses the vehicles table and encounters a row with a driver_id that does not have a matching record in the foreign ( drivers ) table. If it is a/an _ join query, then the execution plan would skip that row and not add it to the query's output.

Fill in the blank above.

SOLUTION: inner

Let's check out the execution plan!

In SQLFiddle , you'll notice a link that says "View Execution Plan" after running a SQL command (see the green bar near the bottom).

View the execution plan and generally be familiar with what is happening to the database under the hood. The execution plan gives important insight into the performance of the query. Since there can multiple ways of performing queries against a database with various performance tradeoffs, it's important to choose the one with the best execution plan for quickly and efficiently returning the results that you need.

In SQLFiddle, for a particular result, click on the "View Execution Plan" link, and under Query Plan, click on "View on Depesz".

For a SELECT * operation, the most basic operation is Seq Scan ("Sequential Scan"), as explained here on depesz.com , where PostgreSQL opens the file storing the table, then reads each rows, one by one, returning them to user. We should generally know that this is what happens when we do a SELECT statement.

Let's look at something more complicated. For running a join,

SELECT make, model from vehicles
JOIN drivers on vehicles.driver_id = drivers.id;

The execution plan looks like this :

  1. Hash Join : joins two record sets. It is the most expensive part of the plan, as indicated by the 'cost', it is joining every row! (Is that necessary? Can we accomplish finding out what we need while devising an execution plan that doesn't require this?) The hash join creates a hash in-memory that hashes based the driver_id column.
  2. Seq Scan : a sequential scan is done across the entire vehicles table. This makes sense since we're looking to fetch all make and model information across all records in the vehicles table.
  3. Hash with Seq Scan on drivers: as the sequential scan continues, the join key is checked in the Hash returned from Step 1, where if it does NOT exist, given that this is an Inner Join, we ignore that row, and if it does exist (a record was found that does intersect between the vehicles and drivers tables), then we fetch the row from the hash to generate the outputted, joined row.

We can always gain visibility over how performant a given SQL query is by looking at its execution plan.

On Performance

Learning how to write efficient queries is practically its own field. There are techniques for improving the performance of SQL queries to consider, we can use critical indexes to speed up information lookups , and there are helpful utilities like SQL views for splitting queries into subroutines.

It won't be necessary to understand how to optimize SQL queries to complete the rest of this course. However, you may want to look particularly into SQL Indexes and generally have a performance-oriented mindset with regards to writing efficient queries. I recommended reading and following the blog, Use the Index, Luke! as a highly in-depth guide to database performance for developers.

How important is it to know about the performance of your SQL queries? Well, would you want to visit a site that takes 4 seconds to load your results ( every time you do a search), or a half of a second? You tell me…

Knowing and applying performance strategies to a SQL query can wound up being really powerful .


Now for some fun resources on Hashes: